18 research outputs found

    Comparing the hierarchy of keywords in on-line news portals

    Get PDF
    The tagging of on-line content with informative keywords is a widespread phenomenon from scientific article repositories through blogs to on-line news portals. In most of the cases, the tags on a given item are free words chosen by the authors independently. Therefore, relations among keywords in a collection of news items is unknown. However, in most cases the topics and concepts described by these keywords are forming a latent hierarchy, with the more general topics and categories at the top, and more specialised ones at the bottom. Here we apply a recent, cooccurrence-based tag hierarchy extraction method to sets of keywords obtained from four different on-line news portals. The resulting hierarchies show substantial differences not just in the topics rendered as important (being at the top of the hierarchy) or of less interest (categorised low in the hierarchy), but also in the underlying network structure. This reveals discrepancies between the plausible keyword association frameworks in the studied news portals

    Effects of time window size and placement on the structure of aggregated networks

    Full text link
    Complex networks are often constructed by aggregating empirical data over time, such that a link represents the existence of interactions between the endpoint nodes and the link weight represents the intensity of such interactions within the aggregation time window. The resulting networks are then often considered static. More often than not, the aggregation time window is dictated by the availability of data, and the effects of its length on the resulting networks are rarely considered. Here, we address this question by studying the structural features of networks emerging from aggregating empirical data over different time intervals, focussing on networks derived from time-stamped, anonymized mobile telephone call records. Our results show that short aggregation intervals yield networks where strong links associated with dense clusters dominate; the seeds of such clusters or communities become already visible for intervals of around one week. The degree and weight distributions are seen to become stationary around a few days and a few weeks, respectively. An aggregation interval of around 30 days results in the stablest similar networks when consecutive windows are compared. For longer intervals, the effects of weak or random links become increasingly stronger, and the average degree of the network keeps growing even for intervals up to 180 days. The placement of the time window is also seen to affect the outcome: for short windows, different behavioural patterns play a role during weekends and weekdays, and for longer windows it is seen that networks aggregated during holiday periods are significantly different.Comment: 19 pages, 11 figure

    Identifying Overlapping and Hierarchical Thematic Structures in Networks of Scholarly Papers: A Comparison of Three Approaches

    Get PDF
    We implemented three recently proposed approaches to the identification of overlapping and hierarchical substructures in graphs and applied the corresponding algorithms to a network of 492 information-science papers coupled via their cited sources. The thematic substructures obtained and overlaps produced by the three hierarchical cluster algorithms were compared to a content-based categorisation, which we based on the interpretation of titles and keywords. We defined sets of papers dealing with three topics located on different levels of aggregation: h-index, webometrics, and bibliometrics. We identified these topics with branches in the dendrograms produced by the three cluster algorithms and compared the overlapping topics they detected with one another and with the three pre-defined paper sets. We discuss the advantages and drawbacks of applying the three approaches to paper networks in research fields.Comment: 18 pages, 9 figure

    Community landscapes: an integrative approach to determine overlapping network module hierarchy, identify key nodes and predict network dynamics

    Get PDF
    Background: Network communities help the functional organization and evolution of complex networks. However, the development of a method, which is both fast and accurate, provides modular overlaps and partitions of a heterogeneous network, has proven to be rather difficult. Methodology/Principal Findings: Here we introduce the novel concept of ModuLand, an integrative method family determining overlapping network modules as hills of an influence function-based, centrality-type community landscape, and including several widely used modularization methods as special cases. As various adaptations of the method family, we developed several algorithms, which provide an efficient analysis of weighted and directed networks, and (1) determine pervasively overlapping modules with high resolution; (2) uncover a detailed hierarchical network structure allowing an efficient, zoom-in analysis of large networks; (3) allow the determination of key network nodes and (4) help to predict network dynamics. Conclusions/Significance: The concept opens a wide range of possibilities to develop new approaches and applications including network routing, classification, comparison and prediction.Comment: 25 pages with 6 figures and a Glossary + Supporting Information containing pseudo-codes of all algorithms used, 14 Figures, 5 Tables (with 18 module definitions, 129 different modularization methods, 13 module comparision methods) and 396 references. All algorithms can be downloaded from this web-site: http://www.linkgroup.hu/modules.ph

    A survey of results on mobile phone datasets analysis

    Get PDF

    An Evaluation of Community Detection Algorithms on Large-Scale Email Traffic

    No full text
    Community detection algorithms are widely used to study the structural properties of real-world networks. In this paper, we experimentally evaluate the qualitative performance of several community detection algorithms using large-scale email networks. The email networks were generated from real email traffic and contain both legitimate email (ham) and unsolicited email (spam). We compare the quality of the algorithms with respect to a number of structural quality functions and a logical quality measure which assesses the ability of the algorithms to separate ham and spam emails by clustering them into distinct communities. Our study reveals that the algorithms that perform well with respect to structural quality, don’t achieve high logical quality. We also show that the algorithms with similar structural quality also have similar logical quality regardless of their approach to clustering. Finally, we reveal that the algorithm that performs link community detection is more suitable for clustering email networks than the node-based approaches, and it creates more distinct communities of ham and spam edges
    corecore